In this article, we propose a method for computing the convolution of large 3D images. The convolution is performed in the frequency domain using the convolution theorem. The algorithm is accelerated on a graphics card by means of the CUDA parallel computing model. The convolution is decomposed in the frequency domain using the decimation-in-frequency algorithm. We pay attention to keeping our approach efficient in terms of both time and memory consumption, as well as in terms of memory transfers between CPU and GPU, which have a significant influence on the overall computation time. We also study an implementation on multiple GPUs and compare the results of the multi-GPU and multi-CPU implementations.
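The core idea the abstract relies on, the convolution theorem, states that circular convolution in the spatial domain equals pointwise multiplication in the frequency domain. A minimal NumPy sketch (not the paper's CUDA implementation; array sizes and names are illustrative) verifies this for a small 3D case against a direct triple-loop convolution:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((4, 4, 4))  # small 3D "image"
b = rng.random((4, 4, 4))  # small 3D kernel

# Convolution theorem: circular conv(a, b) = IFFT( FFT(a) * FFT(b) )
conv_fft = np.real(np.fft.ifftn(np.fft.fftn(a) * np.fft.fftn(b)))

# Direct circular convolution, O(N^2), for verification only
n0, n1, n2 = a.shape
conv_direct = np.zeros_like(a)
for i in range(n0):
    for j in range(n1):
        for k in range(n2):
            s = 0.0
            for p in range(n0):
                for q in range(n1):
                    for r in range(n2):
                        s += a[p, q, r] * b[(i - p) % n0, (j - q) % n1, (k - r) % n2]
            conv_direct[i, j, k] = s

assert np.allclose(conv_fft, conv_direct)
```

For large images the FFT route reduces the cost from O(N^2) to O(N log N) in the number of voxels, which is what makes a GPU frequency-domain implementation attractive.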